Masking and Re-identification Methods for Public-Use Microdata: Overview and Research Problems
Abstract
This paper provides an overview of methods of masking microdata so that the data can be placed in public-use files. It divides the methods according to whether they have been demonstrated to provide analytic properties or not. For those methods that have been shown to provide one or two sets of analytic properties in the masked data, we indicate where the data may have limitations for most analyses and how re-identification might or can be performed. We cover several methods for producing synthetic data and possible computational extensions for better automating the creation of the underlying statistical models. We finish by providing background on analysis-specific and general information-loss metrics to stimulate research.
Similar resources
Producing Public-use Microdata That Are Analytically Valid and Confidential
A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same as the original, confidential file that is not distributed. If the microdata file contains a moderate number of variables and is required to meet a single set of analytic needs of, say, university researchers, then many more recor...
Re-identification Methods for Evaluating the Confidentiality of Analytically Valid Microdata
Disclaimer: This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed are those of the author and not necessarily those of the U.S. Census Bureau. A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same...
Re-identification Methods for Masked Microdata
Statistical agencies often mask (or distort) microdata in public-use files so that the confidentiality of information associated with individual entities is preserved. The intent of many of the masking methods is to cause only minor distortions in some of the distributions of the data and possibly no distortion in a few aggregate or marginal statistics. In record linkage (as in nearest neighbor ...
Re-identification Risk in Swapped Microdata Release
Many government agencies, research organizations, healthcare providers, and others release data for public use. These data are used by the public and policy makers and serve an important role in society. In many cases, however, the organizations releasing the data are also required to protect the privacy of the respondents. Hence, these organizations adopt procedures that attempt to prevent discl...
Examples of Easy-to-implement, Widely Used Methods of Masking for which Analytic Properties are not Justified
This paper provides examples that illustrate the severe analytic distortions caused by many widely used masking methods that have been in use for a number of years. The masking methods are intended to reduce or eliminate re-identification risk in public-use files. Although the masking methods yield files that do not allow reproduction of the analytic properties of original, confidential files, in a nu...
Publication date: 2004